entropic mirror descent
Supplementary Material for Mixture weights optimisation for Alpha-Divergence Variational Inference Kamélia Daudel1,2, Randal Douc3
Assume that p and k are as in (A1). Then, the two following assertions hold. A.3 The case α < 1 for the Power Descent algorithm Let α = 1, η (0,1], κbe such that (α 1)κ 0and let the initial probability measure µ1 M1(T) be such that Ψα(µ1) < . A common way to approximate intractable integrals of the form (16) is to resort to Importance Sampling methods and in that case we are also interested in ensuring that the support of the variational approximation q Q (with q = µk in our case) is included in the support of p. Seeking to solve the Variational Inference optimation problem inf Dα(µK||P) for α < 1 enables this to happen, as opposed to the case α 1 for which the α-divergenve exhibits the so-called mode-seeking property [2, 3, 4]. As a whole, well-chosen samplers and variance reduction methods appear to be a necessity even in the case α = 1 so that the obtained Monte Carlo estimator of θ 7 bµ,α(θ)do not suffer from a too large variance.
Mixture weights optimisation for Alpha-Divergence Variational Inference
This paper focuses on α-divergence minimisation methods for Variational Inference. We consider the case where the posterior density is approximated by a mixture model and we investigate algorithms optimising the mixture weights of this mixture model by α-divergence minimisation, without any information on the underlying distribution of its mixture components parameters. The Power Descent, defined for all α = 1, is one such algorithm and we establish in our work the full proof of its convergence towards the optimal mixture weights when α < 1. Since the α-divergence recovers the widely-used exclusive Kullback-Leibler when α 1, we then extend the Power Descent to the case α = 1 and show that we obtain an Entropic Mirror Descent. This leads us to investigate the link between Power Descent and Entropic Mirror Descent: first-order approximations allow us to introduce the Rényi Descent, a novel algorithm for which we prove an O(1/N) convergence rate. Lastly, we compare numerically the behavior of the unbiased Power Descent and of the biased Rényi Descent and we discuss the potential advantages of one algorithm over the other.
Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias
Malitsky, Yura, Posch, Alexander
This paper focuses on applying entropic mirror descent to solve linear systems, where the main challenge for the convergence analysis stems from the unboundedness of the domain. To overcome this without imposing restrictive assumptions, we introduce a variant of Polyak-type stepsizes. Along the way, we strengthen the bound for $\ell_1$-norm implicit bias, obtain sublinear and linear convergence results, and generalize the convergence result to arbitrary convex $L$-smooth functions. We also propose an alternative method that avoids exponentiation, resembling the original Hadamard descent, but with provable convergence.